91 research outputs found

    Correcting for misclassification error in gross flows using double sampling: moment-based inference vs. likelihood-based inference

    Gross flows are discrete longitudinal data defined as transition counts, between a finite number of states, from one point in time to another. We discuss the analysis of gross flows in the presence of misclassification error via double sampling methods. Traditionally, estimates adjusted for misclassification error are obtained using a moment-based estimator. We propose a likelihood-based approach that works by simultaneously modeling the true transition process and the misclassification error process within the context of a missing data problem. Monte Carlo simulation results indicate that the maximum likelihood estimator is more efficient than the moment-based estimator.

    Robust small area prediction for counts

    A new semiparametric approach to model-based small area prediction for counts is proposed and used for estimating the average number of visits to physicians for Health Districts in Central Italy. The proposed small area predictor can be viewed as an outlier-robust alternative to the more commonly used empirical plug-in predictor that is based on a Poisson generalized linear mixed model with Gaussian random effects. Results from the real data application and from a simulation experiment confirm that the proposed small area predictor has good robustness properties and in some cases can be more efficient than alternative small area approaches.

    A Comparison of Methods for Poverty Estimation in Developing Countries

    Small area estimation is a widely used indirect estimation technique for micro-level geographic profiling. Three unit-level small area estimation techniques, the ELL or World Bank method, empirical best prediction (EBP) and M-quantile (MQ), can estimate micro-level Foster, Greer, & Thorbecke (FGT) indicators (poverty incidence, gap and severity) using both unit-level survey and census data. However, they rely on different assumptions. The effects of using model-based unit-level census data reconstructed from cross-tabulations and of having no cluster-level contextual variables for models are discussed, as are the effects of small area and cluster-level heterogeneity. A simulation-based comparison of ELL, EBP and MQ uses a model-based reconstruction of 2000/2001 data from Bangladesh and compares bias and mean square error. A three-level ELL method is applied for comparison with the standard two-level ELL, which lacks a small area level component. An important finding is that the larger number of small areas for which ELL has been able to produce sufficiently accurate estimates, in comparison with EBP and MQ, has been driven more by the type of census data available or utilised than by the model per se.
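For readers unfamiliar with the FGT indicators named in this abstract, the following is a minimal sketch of the standard Foster-Greer-Thorbecke index computed directly from unit-level values. It is not the ELL, EBP or MQ estimator; the function name `fgt`, its arguments, and the sample figures are hypothetical and chosen only for illustration.

```python
def fgt(incomes, poverty_line, alpha):
    """Foster-Greer-Thorbecke index of order alpha.

    alpha = 0 gives poverty incidence (headcount ratio),
    alpha = 1 gives the poverty gap,
    alpha = 2 gives poverty severity.
    """
    if poverty_line <= 0:
        raise ValueError("poverty_line must be positive")
    if len(incomes) == 0:
        raise ValueError("incomes must be non-empty")
    total = 0.0
    for y in incomes:
        if y < poverty_line:
            # Normalised shortfall of a poor unit, raised to the power alpha.
            total += ((poverty_line - y) / poverty_line) ** alpha
    # Non-poor units contribute zero; the average is over all units.
    return total / len(incomes)


# Illustrative (made-up) welfare values for one small area.
sample = [50, 80, 120, 200]
incidence = fgt(sample, 100, 0)
gap = fgt(sample, 100, 1)
severity = fgt(sample, 100, 2)
```

In a small area setting, an index of this kind would be evaluated area by area on predicted or simulated census values rather than on the raw survey sample alone.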

    Deliverable 2.3-Research needs in terms of statistical methodologies and new data

    The MAKSWELL project was set up to help strengthen the use of evidence and information on well-being and sustainability for policy-making in the EU, as political attention to well-being and sustainability indicators has been increasing in recent years. Traditionally, sample surveys are the data source used for measurement frameworks for well-being and sustainability. Over the last decades, more and more new, alternative data sources have become available. Examples are administrative data such as tax registers, or other large data sets (so-called big data) that are generated as a by-product of processes not directly related to statistical production purposes. Deliverables 2.1, 2.2, 3.1, 4.1 and 4.3 discuss in detail how these new data sources can be used in the production of official statistics and measurement frameworks for well-being and sustainability indicators. This Deliverable builds on the experience gained in these preceding deliverables by pointing out the needs for new data sources and methods in this context.

    A novel framework for validating and applying standardized small area measurement strategies

    Background: Local measurements of health behaviors, diseases, and use of health services are critical inputs into local, state, and national decision-making. Small area measurement methods can deliver more precise and accurate local-level information than direct estimates from surveys or administrative records, where sample sizes are often too small to yield acceptable standard errors. However, small area measurement requires careful validation using approaches other than conventional statistical methods such as in-sample or cross-validation methods because they do not solve the problem of validating estimates in data-sparse domains. Methods: A new general framework for small area estimation and validation is developed and applied to estimate Type 2 diabetes prevalence in US counties using data from the Behavioral Risk Factor Surveillance System (BRFSS). The framework combines the three conventional approaches to small area measurement: (1) pooling data across time by combining multiple survey years; (2) exploiting spatial correlation by including a spatial component; and (3) utilizing structured relationships between the outcome variable and domain-specific covariates to define four increasingly complex model types, coined the Naive, Geospatial, Covariate, and Full models. The validation framework uses direct estimates of prevalence in large domains as the gold standard and compares model estimates against it using (i) all available observations for the large domains and (ii) systematically reduced sample sizes obtained through random sampling with replacement. At each sampling level, the model is rerun repeatedly, and the validity of the model estimates from the four model types is then determined by calculating the (average) concordance correlation coefficient (CCC) and (average) root mean squared error (RMSE) against the gold standard. The CCC is closely related to the intraclass correlation coefficient and can be used when the units are organized in groups and when it is of interest to measure the agreement between units in the same group (e.g., counties). The RMSE is often used to measure the differences between values predicted by a model or an estimator and the actually observed values. It is a useful measure to capture the precision of the model or estimator. Results: All model types have substantially higher CCC and lower RMSE than the direct, single-year BRFSS estimates. In addition, the inclusion of relevant domain-specific covariates generally improves predictive validity, especially at small sample sizes, and their leverage can be equivalent to a five- to tenfold increase in sample size. Conclusions: Small area estimation of important health outcomes and risk factors can be improved using a systematic modeling and validation framework, which consistently outperformed single-year direct survey estimates and demonstrated the potential leverage of including relevant domain-specific covariates compared to pure measurement models. The proposed validation strategy can be applied to other disease outcomes and risk factors in the US as well as to resource-scarce situations, including low-income countries. These estimates are needed by public health officials to identify at-risk groups, to design targeted prevention and intervention programs, and to monitor and evaluate results over time.
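The two validation metrics named in this abstract can be sketched as follows. This is an illustration only: it assumes the CCC meant here is Lin's concordance correlation coefficient with population (1/n) moments, which the abstract does not state explicitly, and the function names and sample figures are hypothetical.

```python
import math


def rmse(estimates, gold):
    """Root mean squared error between model estimates and a gold standard."""
    if len(estimates) != len(gold) or len(gold) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - g) ** 2 for e, g in zip(estimates, gold)]
    return math.sqrt(sum(squared) / len(squared))


def ccc(estimates, gold):
    """Lin's concordance correlation coefficient (assumed definition).

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using 1/n moments. It equals 1 only under perfect agreement.
    """
    if len(estimates) != len(gold) or len(gold) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gold)
    mean_x = sum(estimates) / n
    mean_y = sum(gold) / n
    var_x = sum((x - mean_x) ** 2 for x in estimates) / n
    var_y = sum((y - mean_y) ** 2 for y in gold) / n
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(estimates, gold)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both inputs are one identical constant")
    return 2 * cov_xy / denominator


# Made-up example: estimates are shifted by a constant from the gold standard.
shifted_ccc = ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
shifted_rmse = rmse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A constant shift leaves the ordinary correlation at 1 but lowers the CCC, which is why a concordance measure suits comparisons against a gold standard better than correlation alone.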

    M-Quantile and expectile random effects regression for multilevel data

    The analysis of hierarchically structured data is usually carried out by using random effects models. The primary goal of random effects regression is to model the expected value of the conditional distribution of an outcome variable given a set of explanatory variables while accounting for the dependence structure of hierarchical data. The expected value, however, may not offer a complete picture of this conditional distribution. In this paper we propose using linear M-quantile regression to model other parts of the conditional distribution of the outcome variable given the covariates. The proposed random effects regression model extends M-quantile regression and can be viewed as an alternative to the quantile random effects model. Inference for estimators of the fixed and random effects parameters is discussed. The performance of the proposed methods is evaluated in a series of simulation studies. Finally, we present a case study where M-quantile and expectile random effects regression is employed for analyzing repeated measures data collected from a rotary pursuit tracking experiment.
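To make the notion of an expectile concrete, here is a minimal sketch of a sample expectile computed by iteratively reweighted averaging (asymmetric least squares). It covers only the location case, with no covariates and no random effects, so it is not the regression model proposed in the paper; the function name `expectile` and the sample data are hypothetical.

```python
def expectile(values, tau, tol=1e-10, max_iter=1000):
    """Sample expectile of order tau, 0 < tau < 1.

    Solves sum_i w_i * (y_i - mu) = 0 with w_i = tau for y_i > mu and
    w_i = 1 - tau otherwise, by repeatedly taking a weighted mean.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    mu = sum(values) / len(values)  # start from the ordinary mean
    for _ in range(max_iter):
        weights = [tau if y > mu else 1 - tau for y in values]
        new_mu = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Made-up data with one large value.
data = [1.0, 2.0, 3.0, 4.0, 10.0]
centre = expectile(data, 0.5)   # tau = 0.5 reproduces the mean
upper = expectile(data, 0.9)    # weights observations above mu more heavily
```

Setting `tau = 0.5` recovers the mean, which mirrors the abstract's point that the expected value is one summary of the distribution and other values of `tau` describe other parts of it.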

    Small area estimation via m-quantile geographically weighted regression

    The effective use of spatial information, that is the geographic locations of population units, in a regression model-based approach to small area estimation is an important practical issue. One approach for incorporating such spatial information in a small area regression model is via Geographically Weighted Regression (GWR). In GWR the relationship between the outcome variable and the covariates is characterised by local rather than global parameters, where local is defined spatially. In this paper we investigate GWR-based small area estimation under the M-quantile modelling approach. In particular, we specify an M-quantile GWR model that is a local model for the M-quantiles of the conditional distribution of the outcome variable given the covariates. This model is then used to define a bias-robust predictor of the small area characteristic of interest that also accounts for spatial association in the data. An important spin-off from applying the M-quantile GWR small area model is that it can potentially offer more efficient synthetic estimation for out-of-sample areas. We demonstrate the usefulness of this framework through both model-based as well as design-based simulations, with the latter based on a realistic survey data set. The paper concludes with an illustrative application that focuses on estimation of average levels of Acid Neutralizing Capacity for lakes in the north-east of the USA.